Department of Statistical Science, UCL
Resources
Slides and code here: github.com/n8thangreen/data-science-in-health-talk
Health literacy is broadly defined as the ability to access, understand, appraise, and communicate health information, enabling individuals to engage in healthcare and maintain good health throughout their lives.
UCL Public Policy Fellowship
Newham Residents Survey 2023 (NRS)
Skills for Life Survey 2011
Additional data
The predicted probability \(\hat{\pi}_i\) is defined as: \[ \hat{\pi}_i = \text{logit}^{-1} \left( \hat{\beta}_0 + \sum_{x} \hat{\beta}^{x}_{\gamma_x[i]} \right) \]
where \(\hat{\beta}_0\) is the intercept, \(\hat{\beta}^{x}_{\gamma_x[i]}\) are coefficients for covariates \(x\) (age, sex, eng, white, ukborn, qual, inc, job, work, home), and \(\gamma_x[i]\) represents the level or category of covariate \(x\) for individual \(i\). IMD is included as a multilevel random effect, \(\beta^{\text{IMD}}_j \sim \text{N}(\mu_{\text{IMD}}, \sigma_{\text{IMD}}^2)\). Prior distributions for the fixed effects are normal distributions centered at zero with modest variance, and half-normal priors are used for the random-effect standard deviations.
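As a minimal sketch of how the predicted probability is assembled from the fitted coefficients, the Python below sums the intercept and one coefficient per covariate level, then applies the inverse logit. The function names, covariate levels, and coefficient values are hypothetical and chosen only for illustration; they are not taken from the fitted model.

```python
import math


def inv_logit(eta):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(intercept, coefs, levels):
    """Add the intercept to the coefficient of each covariate's observed level.

    coefs:  {covariate: {level: coefficient}}
    levels: {covariate: level observed for this individual}
    """
    eta = intercept + sum(coefs[x][levels[x]] for x in levels)
    return inv_logit(eta)


# Hypothetical coefficients for two of the covariates (illustration only).
coefs = {
    "age": {"16-44": 0.0, "45+": 0.4},
    "sex": {"female": 0.0, "male": 0.1},
}
p = predicted_probability(-0.5, coefs, {"age": "45+", "sex": "male"})
print(round(p, 3))
```

In the full model the IMD random effect would enter the linear predictor in the same additive way, as one more term indexed by the individual's IMD group.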
The health literacy probabilities for each demographic category (cell \(c\)) are weighted by their proportion in the actual Newham population. With 11 covariates resulting in \(|\mathcal{S}|\) = 13,824 cells, the post-stratified estimate \(\hat{\pi}^{\text{mrp}}\) is: \[ \hat{\pi}^{\text{mrp}} = \sum_{c = 1}^{|\mathcal{S}|} w_c \hat{\pi}_{c} \] where \(\mathcal{S}\) is the set of all covariate combinations, \(N_c\) is the population frequency for cell \(c\), \(N\) is the total population size, and \(w_c = N_{c} / N\) are the combination weights.
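The post-stratification step is a population-weighted average of the cell-level predictions. The sketch below assumes just four cells with made-up probabilities and population counts; the real analysis would use all 13,824 cells and the actual Newham population counts.

```python
def poststratify(cell_probs, cell_counts):
    """Weight each cell's predicted probability by its population share N_c / N."""
    total = sum(cell_counts.values())
    return sum(cell_counts[c] / total * cell_probs[c] for c in cell_counts)


# Hypothetical cells defined by (age, sex); values are for illustration only.
cell_probs = {
    ("16-44", "female"): 0.2,
    ("16-44", "male"): 0.3,
    ("45+", "female"): 0.4,
    ("45+", "male"): 0.5,
}
cell_counts = {
    ("16-44", "female"): 400,
    ("16-44", "male"): 300,
    ("45+", "female"): 200,
    ("45+", "male"): 100,
}

estimate = poststratify(cell_probs, cell_counts)
print(round(estimate, 3))
```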
\[ \delta_u(u^{(1)}, u^{(2)}) = \frac{E(y \mid u^{(2)}) - E(y \mid u^{(1)})}{u^{(2)} - u^{(1)}} \]
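The comparison above is a finite difference in the expected outcome per unit change in the input \(u\). A rough sketch, assuming a hypothetical one-input logistic model for \(E(y \mid u)\) (the coefficients are invented for illustration and other inputs are ignored):

```python
import math


def expected_outcome(u, beta0=0.0, beta_u=math.log(3)):
    """Hypothetical E(y | u) from a logistic model with a single input u."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta_u * u)))


def predictive_comparison(u1, u2):
    """Change in expected outcome per unit change in u, moving from u1 to u2."""
    return (expected_outcome(u2) - expected_outcome(u1)) / (u2 - u1)


delta = predictive_comparison(0.0, 1.0)
print(round(delta, 3))
```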
The Goal: Adjust survey weights (\(w\)) so that the sample distribution matches known population control totals (margins).
Let \(w_{ij}^{(t)}\) be the weight for cell \((i, j)\) at iteration \(t\).
We have Target Margins: row totals \(R_i\) and column totals \(C_j\).
Initially (\(t=0\)), the sample sums do not match the population targets:
\[ \sum_{j} w_{ij}^{(0)} \neq R_i \]
\[ \sum_{i} w_{ij}^{(0)} \neq C_j \]
The algorithm alternates between adjusting rows and columns until convergence.
Step 1: Row Raking (Match Row Targets) \[ w_{ij}^{(t+1/2)} = w_{ij}^{(t)} \times \frac{R_i}{\sum_{k} w_{ik}^{(t)}} \]
Step 2: Column Raking (Match Column Targets) \[ w_{ij}^{(t+1)} = w_{ij}^{(t+1/2)} \times \frac{C_j}{\sum_{k} w_{kj}^{(t+1/2)}} \]
Convergence: Repeat until \(\left| \sum w - \text{Target} \right| < \epsilon\) for every row and column margin.
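The two raking steps and the stopping rule can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: all starting weights are positive, the row and column targets share the same grand total, and the names `rake`, `tol`, and `max_iter` are chosen here rather than taken from the talk's code.

```python
def rake(weights, row_targets, col_targets, tol=1e-8, max_iter=1000):
    """Alternate row and column scaling until all margins match their targets."""
    w = [row[:] for row in weights]  # work on a copy of the starting weights
    for _ in range(max_iter):
        # Step 1: scale each row so it sums to its target R_i.
        for i, r_target in enumerate(row_targets):
            row_sum = sum(w[i])
            w[i] = [x * r_target / row_sum for x in w[i]]
        # Step 2: scale each column so it sums to its target C_j.
        for j, c_target in enumerate(col_targets):
            col_sum = sum(row[j] for row in w)
            for row in w:
                row[j] *= c_target / col_sum
        # Convergence: largest absolute gap between any margin and its target.
        row_err = max(abs(sum(w[i]) - r) for i, r in enumerate(row_targets))
        col_err = max(
            abs(sum(row[j] for row in w) - c) for j, c in enumerate(col_targets)
        )
        if max(row_err, col_err) < tol:
            break
    return w


# Hypothetical 2 x 2 example with uniform starting weights.
raked = rake([[1.0, 1.0], [1.0, 1.0]], row_targets=[30, 70], col_targets=[40, 60])
print([[round(x, 2) for x in row] for row in raked])
```

With uniform starting weights the raked table is the independence table implied by the margins, so this example converges after a single pass.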
To summarize these probabilistic rankings, we adopt the surface under the cumulative ranking curve (SUCRA), a metric common in multiple-treatment meta-analysis. SUCRA represents the percentage of the maximum possible cumulative rank an intervention (in our case, an input variable) can achieve, providing a single value where a higher SUCRA indicates a better overall rank relative to others. For our model, it is given by \[ \text{SUCRA}_{ij} = \sum_{r=1}^{n-1} P_{ijr} / (n-1), \] where \(P_{ijr}\) is the cumulative probability that variable \(i\) at level \(j\) is ranked \(r\) or better, and \(n\) is the number of ranked items. The mean rank is \[ \mathbb{E}[\text{rank}(i,j)] = n - \sum_{r=1}^{n-1} P_{ijr}. \]
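A small sketch of both summaries for a single item, assuming its rank probabilities (the probability of being ranked 1st, 2nd, and so on up to \(n\)th) have already been estimated, for example from posterior draws. The input values are hypothetical.

```python
def sucra(rank_probs):
    """SUCRA from rank probabilities: mean of the first n - 1 cumulative probabilities."""
    n = len(rank_probs)
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:  # ranks 1 .. n-1
        cumulative += p        # P(rank <= r)
        total += cumulative
    return total / (n - 1)


def mean_rank(rank_probs):
    """Expected rank, using E[rank] = n - sum of the first n - 1 cumulative probabilities."""
    n = len(rank_probs)
    return n - sucra(rank_probs) * (n - 1)


# Hypothetical probabilities of being ranked 1st, 2nd, and 3rd.
probs = [0.5, 0.3, 0.2]
print(round(sucra(probs), 3), round(mean_rank(probs), 3))
```

The two quantities carry the same information: the expected rank equals \(n - (n-1)\,\text{SUCRA}\), so an item that is certain to rank first has SUCRA 1 and mean rank 1.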
Reference
Green, N., Kurt, M., Moshyk, A., Larkin, J. and Baio, G. (2025), A Bayesian Hierarchical Mixture Cure Modelling Framework to Utilize Multiple Survival Datasets for Long-Term Survivorship Estimates: A Case Study From Previously Untreated Metastatic Melanoma. Statistics in Medicine, 44: e70132. https://doi.org/10.1002/sim.70132
Nathan Green | UCL | n.green@ucl.ac.uk